Detection and Correction of Non-Words in Arabic: a Hybrid Approach

Authors

  • Bassam Haddad
  • Mustafa Yaseen
Abstract

Because Arabic has a highly inflectional morphological structure, this hybrid approach uses morphological knowledge in the form of consistent root-pattern relationships, together with some morpho-syntactic knowledge based on affixation and morphographemic rules, to specify the word recognition and non-word correction process. The paper also proposes novel probabilistic measures for completing the correction task by locating, reducing and ranking the most probable correction candidates for Arabic derivative words. In this context, two probabilistic measures based on frequency-of-occurrence analysis are introduced: the Root-Pattern Predictive Value (RPV) and the Pattern-Root Predictive Value (PPV). In addition, keyboard effect, letter sound and similarity are considered, along with some lexical features, as a supplementary aid to improve error detection and correction.
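
The abstract does not give the formulas for RPV or PPV, so the following is only a rough sketch of the general idea of ranking correction candidates by frequency of occurrence. It uses a plain conditional relative frequency as a stand-in, which may differ from the measures the authors define. The counts, the transliterated roots and patterns, and the function names are all hypothetical and chosen purely for illustration.

```python
from collections import Counter

# Hypothetical toy counts of (root, pattern) co-occurrences.
# These are illustrative values, not data from the paper.
PAIR_COUNTS = Counter({
    ("ktb", "fAEl"): 8,
    ("ktb", "mfEwl"): 2,
    ("drs", "fAEl"): 2,
    ("drs", "mfEwl"): 6,
})


def conditional_score(root, pattern, given):
    """Relative frequency of a (root, pattern) pair, conditioned on
    either the root or the pattern. A generic stand-in, not the
    paper's RPV or PPV definition."""
    pair = PAIR_COUNTS[(root, pattern)]
    if given == "root":
        total = sum(c for (r, _), c in PAIR_COUNTS.items() if r == root)
    elif given == "pattern":
        total = sum(c for (_, p), c in PAIR_COUNTS.items() if p == pattern)
    else:
        raise ValueError("given must be 'root' or 'pattern'")
    return pair / total if total else 0.0


def rank_candidates(candidates, given="pattern"):
    """Sort candidate (root, pattern) analyses, most probable first."""
    return sorted(
        candidates,
        key=lambda rp: conditional_score(rp[0], rp[1], given),
        reverse=True,
    )
```

Under these assumed counts, calling `rank_candidates([("drs", "fAEl"), ("ktb", "fAEl")])` would place the `ktb` analysis first, because it accounts for a larger share of the occurrences of that pattern. A fuller system along the lines the abstract describes would also weigh keyboard proximity and letter similarity, which this sketch leaves out.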


Similar resources

Design and implementation of Persian spelling detection and correction system based on Semantic

The Persian language has special features (grapheme, homophone, and multi-shape clinging characters) in electronic devices. Furthermore, the design and implementation of NLP tools for Persian are more challenging than for other languages (e.g. English or German). Spelling tools are widely used for editing user texts such as emails and text in editors. Also developing Persian tools will provide Persian progr...


Scholarly book review (Literature) / "Through culture the soul is made sound": a critique of the book Dictionary of Arabic Words and Expressions of Shahnameh, Houshang Mohammadi Afshar

The latest comprehensive and detailed research on the recognition, description, and the etymology of the Arabic lexicon of Shahnameh is the dictionary of Arabic words and Expressions of Shahnameh, written by Dr. Sajjad Aydanlou. This book is based on the second edition of the Correction of the Khaleghi Motlagh Shahnameh (1393) which is the most authoritative correction and the closest to the or...


A new method for extracting named entities in Classical Arabic

In Natural Language Processing (NLP) studies, developing resources and tools contributes to the extension and effectiveness of research in each language. In recent years, Arabic Named Entity Recognition (ANER) has been considered by NLP researchers due to its significant impact on improving other NLP tasks such as machine translation, information retrieval, question answering, query result...


Intrusion Detection based on a Novel Hybrid Learning Approach

Information security and Intrusion Detection Systems (IDS) play a critical role in the Internet. An IDS is an essential tool for detecting different kinds of attacks in a network and maintaining data integrity, confidentiality and system availability against possible threats. In this paper, a hybrid approach towards achieving high performance is proposed. In fact, the important goal of this paper ...


A hybridization of evolutionary fuzzy systems and ant colony optimization for intrusion detection

A hybrid approach for intrusion detection in computer networks is presented in this paper. The proposed approach combines an evolutionary-based fuzzy system with an Ant Colony Optimization procedure to generate high-quality fuzzy-classification rules. We applied our hybrid learning approach to network security and validated it using the DARPA KDD-Cup99 benchmark data set. The results indicate t...



Journal title:
  • Int. J. Comput. Proc. Oriental Lang.

Volume 20  Issue

Pages  -

Publication date 2007